Skip to content

Fix stdio client shutdown bugs and rebuild the stdio test suite#2773

Merged
maxisbey merged 16 commits into
mainfrom
maxisbey/deflake-stdio-tests
Jun 5, 2026
Merged

Fix stdio client shutdown bugs and rebuild the stdio test suite#2773
maxisbey merged 16 commits into
mainfrom
maxisbey/deflake-stdio-tests

Conversation

@maxisbey
Copy link
Copy Markdown
Contributor

@maxisbey maxisbey commented Jun 3, 2026

The stdio test suite has a long flake history (#1775, #1158, #559, #401). Auditing it surfaced real bugs in the client transport's shutdown path. This PR fixes the transport and rebuilds the stdio tests so they are deterministic.

Motivation and Context

Shutdown bugs fixed (each has a regression test that fails on main):

  1. Cancelling stdio_client could deadlock shutdown forever.
    • Cleanup was not shielded, so a cancelled client fell into anyio's Process.aclose(), which waits for all pipes to close — and a server with a child of its own (npx ..., uv run ...) holds the pipes open indefinitely.
  2. Tree-kill silently did nothing when the server process had already died.
    • os.getpgid() raises once the leader is reaped, leaking every surviving child. The group is now killed via the spawn-time pgid.
  3. "Did the server exit?" actually checked "did all pipes close?".
    • process.wait() is pipe-gated on asyncio 3.11+, so a server that exited promptly but left a background child running was misclassified as hung and its whole tree killed. The wait now keys on returncode, which reflects process death alone.
  4. The Windows fallback's wait() could not be cancelled.
    • It ran in a non-cancellable worker thread, so the timeout around it could never fire and escalation was unreachable.
  5. I/O on a dead server leaked raw backend exceptions.
  6. A killed process was never reaped.
    • The unclosed subprocess transport leaked a ResourceWarning into whatever ran next.

The whole sequence now lives in one cancellation-shielded function with named timeouts: close stdin → grace period keyed on process death → terminate tree → bounded wait → kill tree. See docs/migration.md.

Test rebuild:

  • Transport logic (framing, parse errors, escalation timing, shutdown edges) now runs in process against a small fake process. No sleeps, no polling; escalation timing is pinned on trio's virtual clock.
  • Real subprocesses remain only where the OS itself is under test: group kill semantics (including a dead leader), SIGKILL after an ignored SIGTERM, exec failure. 5 tests, down from 14.
  • New tests/transports/stdio/ suite pins the end-to-end lifecycle on both platforms: clean exit without escalation, cancellation kills the whole tree, stderr routing, and the child-survival policy.
  • Deleted the tee-based tests and test_1027_win_unreachable_cleanup.py (their properties are re-pinned above). test_552_windows_hang.py now asserts real outcomes instead of swallowing exceptions.

How Has This Been Tested?

  • 100% line+branch coverage; the stdio surface runs in ~5s (was ~10.8s)
  • Every fix verified two ways: its regression test fails on main, and the OS-level claims were verified empirically on asyncio and trio
  • 600 clean full-suite runs on GitHub-hosted runners (300 Windows / 300 Ubuntu, Python 3.10–3.14), plus stress runs under xdist and CPU saturation: zero failures, zero orphan processes

Breaking Changes

  • terminate_windows_process (deprecated) is removed; terminate_windows_process_tree no longer takes the unused timeout_seconds.
  • POSIX: a server that exits cleanly no longer has its surviving background children killed (the old kill was a side effect of bug 3). A server that ignores shutdown still gets its whole tree killed.
  • Windows: surviving job members are reaped deterministically at shutdown rather than at GC time.

Details in docs/migration.md.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Two known limitations are documented in code rather than fixed:

  • The shielded shutdown holds against anyio-level cancellation, but a native task.cancel() delivered mid-cleanup can still abort it (no backend-neutral way to refuse native cancellation).
  • On Windows, children spawned before the Job Object assignment completes are outside the job (pre-existing; a CREATE_SUSPENDED spawn is the follow-up if it bites).

AI Disclaimer

…ignals

The stdio cleanup tests asserted upper bounds on elapsed wall-clock time
(cleanup took < 6.0s, < 5.0s, < 3.0s, between 1.5s and 4.5s) and paced
themselves with fixed sleeps ("let the process start"). Upper bounds on
elapsed time fail on slow runners — spawning an interpreter on a loaded
CI box can eat the whole margin — and the sleeps raced process startup.

Every cleanup test now uses the socket-based liveness probe the
process-tree tests already had: the child connects back to a listener
owned by the test (deterministic readiness, replacing the sleeps; setup
lines such as installing a signal handler are guaranteed to have run
before the connect), and the kernel closing that socket when the
process dies is the proof of termination (replacing the elapsed-time
upper bounds). Time bounds that remain are generous fail_after hang
guards, not performance claims, plus one deliberate lower bound:
test_stdio_client_stdin_close_ignored still asserts cleanup takes at
least PROCESS_TERMINATION_TIMEOUT, which pins the genuine time-based
contract that escalation must wait out the stdin-close grace period.
Lower bounds cannot fail from runner slowness.

Details worth knowing:

- test_stdio_client_sigint_only_process is renamed to
  test_stdio_client_sigterm_ignoring_process: the cleanup path never
  sends SIGINT (it escalates stdin-close -> SIGTERM -> SIGKILL), so the
  old child's SIGINT handler never fired and SIGKILL was what actually
  killed it. The test now states what it proves. Its whole-function
  "pragma: lax no cover" stays, with the real reason documented:
  coverage is enforced per CI job at 100% including on Windows runners,
  where the test is skipped and its body would count as uncovered.
- test_stdio_client_graceful_stdin_exit now proves "no signal was
  needed" directly: the child sends a marker over the socket only on
  the stdin-EOF exit path before exiting on its own. The old upper
  bound (< 3.0s) could not distinguish a stdin-driven exit from a
  SIGTERM-driven one, which also completes in about 2.2s.
- TestChildProcessCleanup's three tests become top-level functions per
  the no-test-classes rule; bodies unchanged.
- The defensive except (TimeoutError, Exception) block and the
  pragma'd cancelled_caught/pytest.fail arms are gone: hang guards are
  plain anyio.fail_after, whose TimeoutError fails the test with no
  dead lines to exclude from coverage.
- PROCESS_TERMINATION_TIMEOUT is intentionally not monkeypatched
  anywhere: the 2s grace period is the contract under test in the
  lower-bound test, and everywhere else the tests no longer measure
  time at all, so shrinking it would only save ~6s of real escalation
  waiting at the cost of testing a constant production never uses.
@maxisbey maxisbey force-pushed the maxisbey/deflake-stdio-tests branch 2 times, most recently from 9226789 to e045ebb Compare June 3, 2026 15:38
Comment thread src/mcp/os/win32/utilities.py Outdated
Comment thread tests/client/test_stdio.py Outdated
Comment thread tests/issues/test_1027_win_unreachable_cleanup.py
@maxisbey maxisbey force-pushed the maxisbey/deflake-stdio-tests branch 2 times, most recently from 64b8859 to 5067c98 Compare June 3, 2026 19:13
maxisbey added 2 commits June 3, 2026 19:31
…spurious kills

Four user-facing bugs in the stdio client's shutdown path, all rooted in
the same design problem — the shutdown sequence depended on pipe state
and was not protected from cancellation:

- Cancelling stdio_client (a client timeout, app shutdown) skipped the
  entire shutdown sequence: the first await in the finally block
  re-raised the pending cancellation, so stdin was never closed
  gracefully and the process tree was never terminated. Control then
  fell through to anyio's Process.aclose, whose shielded wait only
  returns once every pipe has closed - and a pipe inherited by a
  grandchild (npx- and uv-style servers always have one) never closes,
  so the client deadlocked forever. The shutdown now runs inside a
  shielded cancel scope with every wait time-bounded, and never relies
  on a pipe-gated wait.

- The grace-period check used process.wait(), which on the asyncio
  backend resolves only when the process has exited AND its pipes have
  closed. A well-behaved server that exited instantly on stdin closure
  but left a background child holding stdout was misclassified as hung,
  burned the full grace period, and got its tree terminated with a
  spurious warning. The wait now polls returncode, which reflects
  process death alone.

- Process-tree termination derived the group ID with os.getpgid(pid),
  which fails once the leader has been reaped even while its
  descendants are alive — and the fallback then "terminated" the dead
  leader, leaking every descendant. Since the process is spawned with
  start_new_session=True, the pgid is by definition the leader's pid;
  use it directly, and treat ProcessLookupError from killpg as "group
  already gone" rather than a reason to fall back.

- Writing to a dead server's stdin surfaced a raw backend exception
  (ConnectionResetError inside an exception group) to the caller
  instead of a clean closed-stream signal. stdin_writer now treats
  BrokenResourceError and ConnectionError like ClosedResourceError.

Windows fixes, by analysis (CI must validate): FallbackProcess.wait()
ran Popen.wait() in a non-cancellable worker thread, so the timeout
around the grace period could never fire and shutdown could hang
indefinitely — it now polls cancellably, and returncode reflects death
without requiring a wait. terminate_windows_process_tree dropped its
timeout_seconds parameter, which was documented but never used (Job
Object termination is immediate; the docstring now says so).

Cleanup in the same files: the escalation now lives in one function
with two named timeouts instead of three hardcoded 2.0s; the Job Object
is tracked in a WeakKeyDictionary instead of a monkey-patched private
attribute on anyio's Process; the deprecated terminate_windows_process
is removed (migration.md updated); the always-true CREATE_NO_WINDOW
hasattr dance and a retry path that double-spawned on spawn failure are
gone; the client's JSON parse error handler catches ValueError instead
of Exception.
… set

The stdio client's logic — framing, parse errors, the shutdown
escalation decisions — is now tested in process against a fake process
injected through the spawn seam, and only the properties that are OS
behaviour keep real subprocesses. The in-process tests include
regressions for the shutdown bugs fixed in the previous commit:
cancellation must still run the full shutdown, tree kill must reach
children after the leader has exited, writes racing server death must
surface clean closure, exiting with a server message still undelivered
must be a clean exit, and a server exiting on stdin close must never
be terminated. Each was verified to fail against the previous
implementation.

The real-process set shrinks to: one consolidated tree-kill test driven
through the public stdio_client (a parent that exits instantly on
SIGTERM, a child, and a grandchild — pinning group inheritance, atomic
group kill, and dead-leader robustness in one spawn), the
SIGTERM-ignoring SIGKILL escalation test, a dead-group no-op test, and
exec-failure ENOENT. Grace and kill timeouts are shortened via
monkeypatch in the real tests: the escalation decision is pinned in
process against a shortened grace, the SIGTERM-then-SIGKILL escalation
itself is exercised only by the real escalation test, and the
production constants' values are deliberately unpinned. The stdio
surface drops from ~11s to ~3.5s.

Deleted as subsumed or broken:
- The two tee-based tests: the interaction suite's subprocess e2e pins
  the real-pipe round trip with a real server, and tee could never
  deterministically exercise the chunk-reassembly buffer that the new
  in-process framing test pins (real pipes deliver short messages as
  whole lines).
- test_stdio_client_bad_path: its script only "failed" because
  non-existent-file.py happens to be a Python expression that raises
  NameError; the property (stdout EOF fails initialize with
  CONNECTION_CLOSED) is now pinned in process.
- The basic/early-parent cleanup tests and universal-cleanup /
  graceful-stdin-exit tests, folded into the consolidated tree test and
  the in-process graceful/escalation tests.
- tests/issues/test_1027_win_unreachable_cleanup.py: its lifespan
  cleanup chain is covered by the interaction e2e's clean-exit marker
  plus the lifespan tests, it polled marker files with fixed sleeps,
  and it leaked its child process when an assertion failed mid-test.
  Its only helper module (tests/shared/test_win32_utils.py) goes with
  it. The MCPServer.run("stdio") coverage its subprocess scripts
  provided is replaced by an in-process test that drives run() over
  injected stdio and asserts a real response.

tests/issues/test_552_windows_hang.py now asserts initialize succeeded
instead of swallowing every exception (its handcrafted response also
sent the invalid protocolVersion "1.0", so it errored on every run and
passed anyway), the stdio server tests are bounded by fail_after, and
the elicitation tests that claimed "stdio" in their names run, and now
say they run, over the in-memory transport.
@maxisbey maxisbey force-pushed the maxisbey/deflake-stdio-tests branch from 5067c98 to 3035984 Compare June 3, 2026 20:17
@maxisbey maxisbey marked this pull request as ready for review June 3, 2026 22:55
@maxisbey maxisbey requested a review from Kludex June 3, 2026 22:56
maxisbey added 8 commits June 4, 2026 09:25
New tests/transports/stdio/ suite pinning the client transport's
process-lifecycle contracts against real subprocesses, synchronized
through TCP liveness sockets (kernel FIN/RST and connect-back) with no
sleeps and no timing assertions:

- Both platforms: a well-behaved server exits on stdin close and is
  reaped without escalation; cancellation mid-session still terminates
  the whole server tree; a server that exits mid-session keeps its own
  exit code; server stderr reaches the errlog file; FallbackProcess
  reports death through returncode without wait() and its wait() is
  cancellable (the SelectorEventLoop fallback wraps a plain Popen, so
  these run on every platform with waitid where available).
- POSIX: a gracefully-exited server's background child survives client
  shutdown (the documented policy), proven by an echo round trip plus a
  never-escalated seam; the surviving child's next write to its
  inherited stdout fails with EPIPE once the client has released its
  pipe ends.
- Windows: the same scenario's documented opposite outcome - the child
  is reaped when the Job Object handle closes at shutdown; a
  SelectorEventLoop session engages the FallbackProcess wrapper and
  completes a clean lifecycle; a print()-based server emitting CRLF
  line endings round-trips messages.

Shared helpers live in _liveness.py (extracted from the real-process
section of tests/client/test_stdio.py; consolidating that file onto
this module is follow-up work), with conftest fixtures recording the
spawn and terminate seams and reaping spawn-time process groups on
teardown.
Source and docs (comment/docstring-level; one dead-code removal):
- Correct the shutdown shield comment: a native task.cancel() delivered
  while cleanup is in progress can abort it regardless of count; the
  consumed-at-the-yield arithmetic only covered cancellation that
  initiated teardown.
- Scope the pipe-gated process.wait() claims to asyncio on Python
  3.11+ (3.10 and trio resolve on process exit alone), in the
  _wait_for_process_exit docstring and the migration guide.
- Add sunset notes to the cpython#106749 tracer workarounds (3.11-only;
  revert when 3.11 support is dropped).
- Document stdio_client's raised exceptions (Raises: section).
- Document the Job Object pre-assignment window in create_windows_process
  and at the assignment call site; document the _process_jobs
  WeakKeyDictionary's two load-bearing invariants (PyHANDLE values,
  identity-hashed weakref keys); fix a wrong comment about the
  CloseHandle failure path in _maybe_assign_process_to_job.
- Drop FallbackProcess.stdin_raw/stdout_raw (write-only attributes).

Tests:
- Fix the inert leak canary in the OSError spawn-failure test: the
  pytest.raises ExceptionInfo pinned the leaked streams across
  gc.collect(); drop it before collecting (mutation-verified).
- Correct the EPERM escalation test's pre-fix description (the old code
  fell back to a leader-only kill, leaking other group members - it did
  not hang forever) and trim an unasserted claim from the
  write-after-death docstring.
- Add a CRLF-framed message to the chunk-reframing test, pinning the
  trailing-CR tolerance Windows servers rely on, on every platform.
- Pin the grace wait's returncode read (what reaps the leader's zombie
  on trio) with a read-counting stub, parametrized over asyncio and
  trio; deleting the read fails the test on both backends.
- Add the missing width-justification comment to one fail_after(10.0).
The job-handle-close reap test failed on all windows-latest legs with a
silent 15s accept timeout: the grandchild's connect-back never arrived,
and xdist swallows the server's stderr, so the logs could not say which
link of the spawn chain broke.

Reshape the server to the form the passing cancellation test already
proves on Windows CI: it connects back to the liveness listener itself
after spawning the child, then parks on stdin. The test now accepts two
connections, so a failure distinguishes "server never ran its connect
line" from "child never connected". Capture the server's stderr via
errlog into a temp file and attach it to every failure message, and
wrap the child spawn in a try/except that reports to stderr before
re-raising. The contract is unchanged: both sockets must close after a
graceful exit (server FIN, child killed by the job-handle close), with
returncode 0 and the escalation seam untouched.
The job-reap test still fails on windows-latest with an empty captured
stderr, which proves the server ran its whole script but cannot say
anything about the child. Close the remaining blind spots:

- Route the child's stderr into the server's (which errlog captures),
  so a child that dies at startup leaves its traceback in the failure
  message instead of vanishing into the hidden console.
- Print a child-started marker to stderr first, splitting "child never
  spawned" from "child started but could not connect".
- Report how many of the two liveness connections arrived and which leg
  is missing, instead of one undifferentiated timeout.
- Record when the stdio_client context was entered and include the
  spawn-to-entry split in the failure message, so a stalled spawn is
  distinguishable from a stalled child.

Verified on POSIX by running the same choreography with a deliberately
unreachable child port: the failure message names the missing leg and
quotes the child's ConnectionRefusedError traceback verbatim.
The instrumented CI runs showed the grandchild booted (its startup
marker reached stderr on some legs) and then died tracelessly before it
could connect — no traceback, no SDK escalation, no job-handle close.
The remaining kill mechanism in the chain is the venv launcher layer:
sys.executable in a uv-managed venv is a trampoline that runs the real
interpreter inside its own kill-on-close Job, and that private job
machinery proved fatal to grandchildren on the CI runners.

The server now spawns its child through sys._base_executable, removing
the foreign launcher layer from the grandchild chain. The contract
under test is unchanged: the child still inherits the SDK's Job Object
at CreateProcess and must die when the job handle closes at shutdown.

Also report the child's poll() status after stdin EOF ends the server
(child-rc on stderr, zero cost on the healthy path), so any remaining
failure message names the child's exit status - the discriminator
between "still alive but unreachable" and "killed, by this code".
The grandchild was freezing at interpreter startup for as long as the
server sat in its blocking stdin read: CPython queries fd 0 during
startup, the child's inherited stdin was the server's stdin pipe (a
synchronous file object), and Windows serializes that query behind the
server's pending ReadFile. The child only thawed when the client closed
stdin at shutdown - at which point the job-handle close correctly
reaped it before it could connect. Spawning the child with
stdin=DEVNULL removes the coupling, so it boots immediately and the
kill-on-job-close contract can be observed.

Also reverts the previous commit's base-interpreter bypass: the venv
launcher was not involved (the freeze reproduces identically with the
base interpreter). The same stdin coupling applies to any Windows stdio
server that spawns Python children without redirecting their stdin.
After shutdown closes the child's stdin, the child has to unwind its run
loop, write its clean-exit stderr line, and let coverage's atexit hook
persist the subprocess data file before the stdin-close grace expires.
On a slow Windows runner the production 2s grace was too tight: the
escalation killed the child mid-atexit - after the asserted stderr line,
so the test stayed green - and the silently missing data file failed the
100% coverage gate at 99.95%. Widen the grace to 10s for this one test;
the timeout is not under test here.
Restructure stdio_client to read top-down: spawn the server process before
creating the transport streams (a failed spawn now has nothing to clean up),
extract _parse_line, replace the reader's delivering flag with two sequential
phases (deliver, then drain), name the shutdown sequence as a shutdown()
helper with numbered steps, and rename _shutdown_process_tree to
_stop_server_process so it no longer reads as a synonym of
_terminate_process_tree. Add a ServerProcess alias for the process union.

Use the same cancellation idiom as the other client transports for the
bounded waits and task teardown (move_on_after, tg.cancel_scope.cancel())
instead of deadline-poll loops and dedicated per-task scopes; a
cancel_shielded_checkpoint after each site that can throw through the await
chain keeps coverage accurate on Python 3.11 (python/cpython#106749). Drop
four scheduling checkpoints in exception handlers that nothing depends on.

Compress comments to the load-bearing rationale, anchor the shutdown
sequence to the MCP lifecycle specification, and explain why
function-valued ("()"-prefixed) environment variables are skipped.
- Close the read stream before the writer-flush wait: the reader's stdout
  drain is what unwedges a server blocked writing, so a wedged writer's
  flush can now complete instead of burning the flush window (on trio the
  aborted send also truncated an accepted message mid-line).
- Run the reader's drain phase shielded and on raw bytes: caller
  cancellation no longer skips the drain (which got well-behaved servers
  spuriously tree-killed), and a dying server flushing non-UTF-8 output
  can no longer abort shutdown with a decode error.
- Re-check returncode when the grace wait times out, so a server dying in
  the final poll interval is not spuriously escalated.
- Release the subprocess transport by duck type, not isinstance: uvloop's
  transport is not an asyncio.SubprocessTransport, so the deterministic
  fd release silently no-op'd there. Tolerate PermissionError from the
  close (Python <= 3.12 re-kills a setuid child without catching it).
- Log killpg EPERM at warning, not error: macOS raises EPERM for groups
  containing only unreaped zombies, i.e. a cleanly exited server.
- Test teardown tolerates EPERM from killpg (same macOS zombie case), and
  the interaction test's outer deadline now exceeds the grace period it
  configures. New regression tests pin each behavioral fix.
- Document the terminate_posix_process_tree group-leader requirement and
  the win32 utilities logger rename in the migration guide.
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bugs in this change, but it rewrites the stdio client's shutdown path (cancellation shielding, process-tree termination, Windows Job Object handling) and includes documented breaking behavior changes, so it needs a human reviewer's judgment rather than auto-approval.

Extended reasoning...

Overview

This PR rewrites the shutdown sequence of stdio_client in src/mcp/client/stdio.py, overhauls mcp/os/posix/utilities.py (process-group SIGTERM/SIGKILL escalation, EPERM handling) and mcp/os/win32/utilities.py (Job Object tracking via a weak-keyed map, deterministic handle close, removal of the deprecated terminate_windows_process), updates docs/migration.md, and rebuilds the stdio test suite around an in-process FakeProcess plus a new real-subprocess lifecycle suite under tests/transports/stdio/.

Security risks

No direct security exposure (no auth, crypto, or input-handling surfaces). The risk profile is reliability-shaped: process lifecycle management, signal/Job Object semantics, cancellation shielding, and resource cleanup. A subtle bug here would manifest as leaked or orphaned server processes, hung shutdowns, or platform-specific (Windows/macOS/trio) regressions rather than a vulnerability.

Level of scrutiny

High. The transport shutdown path is production-critical for every stdio-based MCP client, the change is large and concurrency-heavy (cancel scopes, shields, backend differences between asyncio 3.10/3.11+ and trio), it removes a public deprecated API, and it intentionally changes behavior (POSIX no longer kills a gracefully-exited server's surviving children; Windows reaps job survivors deterministically). That child-survival policy change in particular is a design decision a maintainer should explicitly sign off on, not something an automated approval should accept.

Other factors

The automated bug-hunting pass found no issues, the PR description documents per-fix regression tests with claimed 100% coverage and extensive stress runs, and the maintainer's earlier inline comments (docstring style, the timing-based test, the deleted #1027 regression test) appear to have been addressed with follow-up commits. Those are positive signals, but given the scope, platform-specific behavior, and breaking changes, this falls well outside the simple/mechanical category eligible for shadow approval.

maxisbey added 4 commits June 5, 2026 10:13
- Wait for process exit with a plain deadline-polling loop instead of a
  cancel scope: the timeout path no longer throws through the await chain,
  which removes the coverage-resync checkpoint it needed
  (python/cpython#106749) and collapses the double-exit/re-check shape
  into a single while loop
- Extract the reader's shutdown drain (_drain_stdout), the tolerant pipe
  closes (_close_pipe, replacing two pragma-guarded inline blocks), the
  POSIX group-liveness probe (_group_alive), and Job Object handle closing
  (_close_job_handle)
- Add a module docstring telling the shutdown story once and trim comments
  to the constraints they pin

No behavior change intended; the full suite, coverage, and the stdio
stress runs are unchanged.
stdio_client now carries the same AsyncGenerator[TransportStreams, None]
annotation as streamable_http_client, so it formally satisfies the
Transport protocol instead of leaking anyio's concrete memory-stream
types. Also adds the missing -> None on FallbackProcess.__init__ and
retypes the test helper to the ReadStream protocol.
Adopt a uniform comment standard for the stdio transport and its tests:
PEP 257 one-line summaries, docstring bodies only for contracts or for
warning-of-consequences notes (why the obvious alternative is wrong),
inline comments at most two lines, no Args/Returns boilerplate on private
helpers, ASCII punctuation only. Facts guarding non-obvious OS behavior
(killpg/EPERM semantics, the pipe-gated process.wait(), the Windows
stdin-pipe interpreter freeze, CPython gh-106749) are kept in compressed
form; pure narration is deleted.

No code changes outside comments and docstrings.
Every docstring now opens with a single-line summary sentence followed by
a blank line and the body. Restores the useful content that an earlier
slimming pass cut from create_windows_process: the SelectorEventLoop /
NotImplementedError fallback explanation, the Job Object purpose, the
assignment-window caveat, and a Returns section explaining the
Process | FallbackProcess union.

Docstring-only change; no code or behavior changes.
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bugs, but this is a large PR rewriting the stdio client's shutdown path (process-group/Job Object termination, cancellation shielding) with documented breaking changes, so it needs a careful human review rather than an automated approval.

Extended reasoning...

Overview

The PR rewrites the stdio client transport's shutdown sequence in src/mcp/client/stdio.py, the POSIX process-group termination in src/mcp/os/posix/utilities.py, and the Windows Job Object handling in src/mcp/os/win32/utilities.py, and rebuilds the entire stdio test surface (a new fake-process unit suite, a new real-subprocess lifecycle suite under tests/transports/stdio/, plus deletion/rewrite of two issue-regression tests). It also documents two breaking changes in docs/migration.md: removal of the deprecated terminate_windows_process API and a behavioural change where a gracefully-exited server's surviving children are no longer killed on POSIX.

Security risks

No direct security exposure (no auth, crypto, or input-parsing surfaces are touched), but the code does signal handling, killpg on process groups, and Windows Job Object termination — mistakes here manifest as orphaned/leaked server processes or hung shutdowns rather than as injection or data-exposure issues. The EPERM handling around killpg and the new policy of leaving a graceful server's children alive are deliberate behaviour decisions worth a maintainer's sign-off rather than risks per se.

Level of scrutiny

High. This is core, production-critical transport code on every stdio_client user's shutdown path, with platform-divergent semantics (asyncio vs trio, Python 3.10 vs 3.11+, POSIX vs Windows fallback) and intentional breaking changes. The cancellation-shielded shutdown, the returncode-polling exit wait, and the private-attribute walk to close the asyncio subprocess transport are all subtle and not the kind of change a bot should approve without a human weighing the design (especially the child-survival policy change, which is a user-visible behaviour decision, not just a bug fix).

Other factors

The automated bug-hunting pass over the diff surfaced no bugs, the PR claims 100% line+branch coverage with regression tests for each fix, and a human reviewer is already engaged (their inline comments appear addressed). My deferral is about scope and design judgement, not about specific defects found.

@maxisbey maxisbey merged commit bdc48e9 into main Jun 5, 2026
33 checks passed
@maxisbey maxisbey deleted the maxisbey/deflake-stdio-tests branch June 5, 2026 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BrokenResourceError race condition in stdio_client cleanup when context exits quickly

2 participants